ABSTRACT



Recent observations by the Hubble Space Telescope of Cepheids in the Virgo cluster
imply a Hubble constant $H_0 = 80 \pm 17$ km/sec/Mpc. We attempt to clarify some issues
of interpretation of these results for determining the global cosmological parameters $\Omega$
and $\Lambda$. Using the formalism of Bayesian model comparison, the data suggest a universe
with a nonzero cosmological constant $\Lambda > 0$, but vanishing curvature: $\Omega + \Lambda = 1$.



Subject headings: Cosmology: theory - observation 

What have we really learned from the recent observations of Virgo Cepheids and their
implications for the Hubble constant? That is, what have we learned about global cosmology from
these observations, $H_0 = 80 \pm 17$ km/sec/Mpc from HST (Freedman et al. 1994), or $H_0 = 87 \pm 7$
km/sec/Mpc from CFHT (Pierce et al. 1994), combined with a lower limit on the age of the
universe from stellar evolution of $t_0 \gtrsim 12$ Gyr? We shall use Bayesian statistics to give as
precise an answer to this question as we can.

Bayes' theorem states

$$p(\theta|DI) = \frac{p(\theta|I)\,p(D|\theta I)}{p(D|I)}, \qquad (1)$$

where $p(a|bc)$ roughly means "the probability [density] of $a$ given $b$ and $c$." Here, $\theta$ represents the
parameters of the theory we are considering (or more precisely, the statement that the parameters
lie in some range), $D$ the outcome of some experiment, and $I$ any background information. Then
$p(\theta|I)$ is the prior distribution for the parameters, $p(D|\theta I)$ is the likelihood of the data, and $p(D|I)$
is known as the evidence. Usually, this theorem is used to decide how the experiment affects
our knowledge of the parameters of the theory. It can also be used in a more general context to
compare theories and see which better explains the data (MacKay 1992).

This Bayesian approach to theory-testing has the advantage that it automatically incorporates
Ockham's razor, favoring the simpler theory unless the more complicated one is significantly better
at explaining the data. We let $j, k, \ldots$ represent the different models and write the background
information corresponding to each as $I = I_j + I_k + \cdots$, where $+$ is "logical or". Then the
likelihood of model $j$ is $p(D|jI) = p(D|I_j)$, just the evidence for the model as defined above. In
this formalism, the ratio of the probabilities of two models (i.e., the "odds" favoring one or the
other) is given by

$$\frac{p(j|DI)}{p(k|DI)} = \frac{p(j|I)}{p(k|I)}\,
\frac{\int d\theta_j\, p(\theta_j|I_j)\, p(D|\theta_j I_j)}{\int d\theta_k\, p(\theta_k|I_k)\, p(D|\theta_k I_k)}, \qquad (2)$$

where $\theta_j$ and $I_j$ refer to the parameters and background information required for model $j$, and
$p(j|I)$ is the prior probability of the model. We shall concentrate on the second factor (known as
the Bayes factor), which contains the experimental information. A model is favored by this factor
if the average of its likelihood with respect to the prior distribution is greater, that is, if more of its
parameter space is likely, given the data. Thus, if there are large areas of the allowed parameter
space with very low likelihoods, the model as a whole may be disfavored, even if it contains a
strongly favored maximum likelihood.
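
To make the averaging argument concrete, here is a minimal toy sketch; it is not part of the analysis in this paper. It assumes a single hypothetical Gaussian measurement `x_obs` and compares a parameter-free model (theta fixed at 0) with a model whose parameter theta has a flat prior on a wide, arbitrarily chosen range `[-w, w]`. All names and numbers are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, sigma):
    """Normal density with standard deviation sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def std_normal_cdf(z):
    """Cumulative distribution of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical toy data: one measurement with Gaussian error sigma.
x_obs, sigma = 1.0, 1.0

# Model A: no free parameter, theta = 0. Its evidence is just the likelihood.
evidence_a = normal_pdf(x_obs, 0.0, sigma)

# Model B: theta free, flat prior 1/(2w) on [-w, w] (w is an assumed range).
w = 20.0
# Evidence = prior-weighted average of the likelihood over theta.
evidence_b = (std_normal_cdf((x_obs + w) / sigma)
              - std_normal_cdf((x_obs - w) / sigma)) / (2.0 * w)

# Model B contains the best possible fit (theta = x_obs) ...
best_fit_b = normal_pdf(x_obs, x_obs, sigma)

# ... yet most of its prior range fits badly, so its evidence is lower.
bayes_factor_a_over_b = evidence_a / evidence_b
print(best_fit_b > evidence_a, bayes_factor_a_over_b)
```

Narrowing the assumed range `w` toward the region the data prefer raises the evidence for model B, which is the sense in which the Bayes factor penalizes unused parameter space.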

The data that we consider are recent observations of the Cepheids in the Virgo cluster, giving
a value for the Hubble constant, and combinations of data and theory that give a lower limit
on the age of the universe. We will consider the Hubble constant data to be represented by the
likelihood

$$p(D|H_0 I) = N(H_0, \bar H, \delta H_0) \qquad (3)$$

and the information about the age of the universe by the prior (we could equally well write this as 
a likelihood for some other set of data) 

$$p(t_0|I) = L(t_0, \bar t, \delta t_0) = \frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{t_0 - \bar t}{\delta t_0}\right), \qquad (4)$$

where $N(x, \mu, \sigma)$ gives the normal distribution with mean $\mu$ and variance $\sigma^2$, and $L(x, \bar x, \delta x)$ gives
a distribution with an approximate lower limit of $\bar x$ with some "slop" $\delta x$ (this distribution is not
normalized). We consider the HST measurements, which give $\bar H \pm \delta H_0 = 80 \pm 17$ km/sec/Mpc,
and a lower limit on the age of the universe of $\bar t = 11.5$ Gyr (Chaboyer 1994) with $\delta t_0 = 0.5$ Gyr.
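
The two distributions of Eqs. (3) and (4) can be sketched as follows. This is a minimal illustration; the function names `normal` and `soft_lower_limit` are our own labels, and the constants are the values quoted in the text.

```python
import math

def normal(x, mean, sigma):
    """N(x, mean, sigma): normal density with variance sigma**2."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def soft_lower_limit(x, x_bar, dx):
    """L(x, x_bar, dx) = 1/2 + arctan((x - x_bar)/dx)/pi; not normalized."""
    return 0.5 + math.atan((x - x_bar) / dx) / math.pi

# Values quoted in the text.
H_BAR, DH = 80.0, 17.0   # km/sec/Mpc (HST)
T_BAR, DT = 11.5, 0.5    # Gyr (Chaboyer 1994)

# The age prior rises from near 0 well below T_BAR to near 1 well above it.
for t0 in (9.0, 11.0, 11.5, 12.0, 14.0):
    print(t0, soft_lower_limit(t0, T_BAR, DT))
```

The prior equals 1/2 exactly at the nominal limit and 1/4 or 3/4 one "slop" width below or above it, so ages somewhat below 11.5 Gyr are disfavored rather than excluded.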

The age and Hubble constant are related to the cosmological parameters in a
matter-dominated FRW universe by

$$H_0 t_0 = 100h\, t_0\ \mathrm{km/s/Mpc} = f(\Omega, \Lambda)
= \int_0^1 \frac{dx}{\sqrt{1 - \Omega - \Lambda + \Omega x^{-1} + \Lambda x^2}}, \qquad (5)$$

where $H_0 = 100h$ km/s/Mpc, $\Omega$ is the present density of non-relativistic matter and $\Lambda$ is the
present vacuum density in units of the critical density (see, for example, Kolb & Turner 1989).
This simplifies to the familiar forms in the case of $\Lambda = 0$. In particular, $H_0 t_0 = 2/3$ for
$\Omega = 1$, $\Lambda = 0$.
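
The integral for $f(\Omega, \Lambda)$ is easy to evaluate numerically. The sketch below is a minimal illustration using a plain midpoint rule, which avoids evaluating the integrand at $x = 0$. The function names and step count are our own choices, and the closed form for the flat case $\Omega + \Lambda = 1$ is the standard textbook result, used here only as a cross-check.

```python
import math

def age_factor(omega, lam, n=20000):
    """Midpoint-rule estimate of f = H0*t0 for a matter-dominated FRW model.

    Integrand: 1 / sqrt(1 - omega - lam + omega/x + lam*x**2) on 0 < x < 1.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += 1.0 / math.sqrt(1.0 - omega - lam + omega / x + lam * x * x)
    return total * h

def flat_closed_form(omega):
    """Closed form of f for omega + lam = 1 with 0 < omega < 1."""
    lam = 1.0 - omega
    return 2.0 / (3.0 * math.sqrt(lam)) * math.asinh(math.sqrt(lam / omega))

print(age_factor(1.0, 0.0))   # Einstein-de Sitter case, should be close to 2/3
print(age_factor(0.3, 0.7), flat_closed_form(0.3))
print(age_factor(0.3, 0.0))   # open model with no cosmological constant
```

Lowering $\Omega$ or raising $\Lambda$ increases $f$, which is why data preferring a large $H_0 t_0$ pull away from $\Omega = 1$, $\Lambda = 0$.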

In this case, we shall examine the following "theories," or classes of models: 



1. $\Omega = 1$, $\Lambda = 0$;

2. $\Omega + \Lambda = 1$;

3. $0 < \Omega < 1$, $\Lambda = 0$;

4. $0 < \Omega < 1$, $0 < \Lambda < 1$.



Model 1 has no parameters (and so is the "simplest"), models 2 and 3 both have one parameter, and
model 4 has two parameters. If we choose one of the models with a parameter, we may of course
use the usual Bayesian techniques to find confidence intervals for $\Omega$ and $\Lambda$. (For now, we are not
allowing for the possibility that $\Omega > 1$.) We have also not allowed any "cosmic variance," that
is, the possibility that we live in a large underdense region and are only measuring the local
expansion rate.

Now, we must combine this information using Bayes' theorem and the usual rules from the
calculus of probabilities. The parameters, before marginalization, are all of $H_0$, $t_0$, $\Omega$, $\Lambda$, so

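$$\begin{aligned}
p(H_0 t_0 \Omega\Lambda|DI) &= \frac{p(H_0 t_0 \Omega\Lambda|I)}{p(D|I)}\, p(D|H_0 t_0 \Omega\Lambda I) \\
&= \frac{p(H_0|t_0 \Omega\Lambda I)\, p(t_0|\Omega\Lambda I)\, p(\Omega\Lambda|I)}{p(D|I)}\, N(H_0, \bar H, \delta H_0) \\
&= \frac{\delta\left(H_0 - f(\Omega,\Lambda)/t_0\right) L(t_0)\, p(\Omega\Lambda|I)}{p(D|I)}\, N(H_0, \bar H, \delta H_0). \qquad (6)
\end{aligned}$$

Now, we marginalize over $H_0$ and $t_0$:

$$p(\Omega\Lambda|DI) = \frac{p(\Omega\Lambda|I)}{p(D|I)} \int dt_0\, L(t_0)\, N\left[f(\Omega,\Lambda)/t_0, \bar H, \delta H_0\right]. \qquad (7)$$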

Note that the integral in this expression is $p(D|fI)$, the likelihood for $f = H_0 t_0$ itself, or
for $\Omega$ and $\Lambda$ through $f(\Omega, \Lambda)$. The result of this integral is a distribution cutting off to zero for
$f \lesssim 80$ km/sec/Mpc $\times$ 11.5 Gyr $= 0.92$, and rising roughly proportional to $f$ for $f \gtrsim 0.92$. Because
of normalization problems, this distribution is difficult to calculate exactly; we will approximate it
as

$$p(D|fI) \approx f\, L\left(f, \bar H \bar t, \bar H \delta t_0 + \bar t\, \delta H_0\right). \qquad (8)$$

This expression is correct in the limit $\delta H_0 \to 0$, turning the normal distribution into a Dirac delta
function. Naively, we might expect the likelihood for $f$ to be this function without the factor of $f$
in front; since the observations favor large $f \sim 1$, the distributions are numerically very similar.
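
The origin of the leading factor of $f$ can be checked in one step. As a sketch, replace the normal distribution in the marginalization integral by a delta function in $H_0 = f/t_0$ and use $\delta(g(t_0)) = \delta(t_0 - t_*)/|g'(t_*)|$ with $g(t_0) = f/t_0 - \bar H$, so that $t_* = f/\bar H$ and $|g'(t_*)| = \bar H^2/f$:

$$\int dt_0\, L(t_0, \bar t, \delta t_0)\, \delta\left(\frac{f}{t_0} - \bar H\right)
= \frac{f}{\bar H^2}\, L\left(\frac{f}{\bar H}, \bar t, \delta t_0\right)
\propto f\, L\left(f, \bar H \bar t, \bar H \delta t_0\right).$$

The last step uses only the definition of $L$, whose argument $(f/\bar H - \bar t)/\delta t_0$ equals $(f - \bar H \bar t)/(\bar H \delta t_0)$. The approximation in the text then widens the slop by $\bar t\, \delta H_0$ to allow for a nonzero $\delta H_0$.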

Finally, then, we have the posterior distribution 

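$$p(\Omega\Lambda|DI) = \frac{p(\Omega\Lambda|I)}{p(D|I)}\, f(\Omega,\Lambda)\, L\left(f(\Omega,\Lambda), \bar H \bar t, \bar H \delta t_0 + \bar t\, \delta H_0\right). \qquad (9)$$

The different models we have enumerated above correspond to different values of the prior

$$p(\Omega\Lambda|I) = \begin{cases}
\delta(\Omega - 1)\,\delta(\Lambda) & \text{model 1} \\
\delta(\Omega + \Lambda - 1) & \text{model 2} \\
\delta(\Lambda) & \text{model 3} \\
1 & \text{model 4}
\end{cases} \qquad (10)$$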



For a given model, we can now plot the posterior, or we can integrate over the posteriors to
compare the models. In Figures 1-3, we show the unnormalized posterior for cases 2-4. The
integral over the posterior needed for setting the odds is given in Table 1. The "best fit" model
is $\Omega + \Lambda = 1$; the worst is the simplest with $\Omega = 1$. The "most complicated" model 4 does quite
well, but Fig. 3 shows that this is largely due to the large likelihood in the region with $\Omega + \Lambda \gtrsim 1$;
if we instead use a prior $p(\Omega\Lambda|I) = 2\,\Theta(1 - \Omega - \Lambda)$, where $\Theta$ is the step function, defining
model 4', the odds drop to about 2:1 in favor of model 4' over model 1 (Table 1).
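
The evidence integrals behind such odds can be sketched numerically for the parameter-free model and the two one-parameter models. This is a rough illustration under stated assumptions, not a reproduction of Table 1: it uses the approximate likelihood $f L(f, \ldots)$ described above, flat priors in $\Omega$ on $(0, 1)$, and the dimensionless value $\bar H \bar t = 0.92$ quoted in the text (equivalent to $h = 0.80 \pm 0.17$ with ages in units of 10 Gyr). All function names and grid sizes are our own, and the resulting numbers need not match the table.

```python
import math

H_T = 0.92                          # H-bar * t-bar as quoted in the text
SLOP = 0.80 * 0.05 + 1.15 * 0.17    # H-bar*dt0 + t-bar*dH0, same convention (assumed)

def age_factor(omega, lam, n=2000):
    """Midpoint-rule estimate of f(omega, lam) = H0*t0."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += 1.0 / math.sqrt(1.0 - omega - lam + omega / x + lam * x * x)
    return total * h

def likelihood(f):
    """Approximate p(D|f I): f times the soft lower limit L(f, H_T, SLOP)."""
    return f * (0.5 + math.atan((f - H_T) / SLOP) / math.pi)

def evidence_1d(curve, n=400):
    """Average the likelihood over a flat prior in omega along a model curve."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        omega, lam = curve((i + 0.5) * h)
        total += likelihood(age_factor(omega, lam))
    return total * h

ev1 = likelihood(age_factor(1.0, 0.0))            # model 1: omega = 1, lam = 0
ev2 = evidence_1d(lambda om: (om, 1.0 - om))      # model 2: omega + lam = 1
ev3 = evidence_1d(lambda om: (om, 0.0))           # model 3: omega < 1, lam = 0

print("odds of model 2 over model 1:", ev2 / ev1)
print("odds of model 3 over model 1:", ev3 / ev1)
```

With equal prior probabilities for the models, these ratios are the posterior odds. The ordering (flat model with $\Lambda$ first, then the open model, then $\Omega = 1$) follows from the likelihood increasing with $f$, independent of the exact slop adopted.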



Model                                      Odds
1     $\Omega = 1$, $\Lambda = 0$          1
2     $\Omega + \Lambda = 1$               7.51
3     $\Omega < 1$, $\Lambda = 0$          2.66
4     $\Omega < 1$, $\Lambda < 1$          5.96
4'    $\Omega + \Lambda < 1$               2.09

Table 1: Odds ratios for various cosmological models

In general, the data favor a low $\Omega$ and/or a large $\Lambda$, as expected from data which prefer
$f \gtrsim 0.92$. For the two one-parameter models, we can calculate limits on these parameters, which
are shown in Table 2 for confidence levels of 68%, 95% and 99%, the usual 1-, 2-, and 3-sigma
levels. Obviously, parameters very close to the $\Omega = 1$, $\Lambda = 0$ point are ruled out, but not much
more strongly than the prior information had already disfavored them.



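Significance    $\Omega < 1$, $\Lambda = 0$    $\Omega + \Lambda = 1$
68%             $\Omega < 0.38$                $\Lambda > 0.74$
95%             $\Omega < 0.87$                $\Lambda > 0.30$
99%             $\Omega < 0.97$                $\Lambda > 0.07$

Table 2: Confidence Intervals for $\Omega$ and $\Lambda$.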

So far, we have used a fairly conservative prior range for $t_0$ in these calculations, only requiring
that $t_0 \gtrsim 11.5$ Gyr. A less conservative (with respect to cosmology, not stellar evolution) limit
might be $t_0 \gtrsim 14$ Gyr, perhaps with a larger "slop" $\delta t_0 = 2$ Gyr. When we perform the calculation
with these values, none of the models easily produce values of $f(\Omega,\Lambda) > 1$, so the posterior odds
ratios relative to $\Omega = 1$, $\Lambda = 0$ slightly decrease. If we make other possible changes, such as using
the CFHT results of $H_0 = 87 \pm 7$ km/s/Mpc, the ordering of the theories in relative likelihood
remains the same: $\Omega + \Lambda = 1$ remains the most likely theory and $\Omega = 1$ the least. Since we have
only approximated the likelihood function for $f$ in Eq. (8), it is good to see that the results are
not too strongly dependent on the details of the procedure.

Fig. 1. Unnormalized posterior distribution for $\Lambda$ for Model 2: $\Omega + \Lambda = 1$.

Fig. 2. Unnormalized posterior distribution for $\Omega$ for Model 3: $\Omega < 1$, $\Lambda = 0$.

Fig. 3. Unnormalized posterior distribution for $\Omega$ and $\Lambda$ for Model 4: $\Omega, \Lambda < 1$; contour spacing
is 0.25 and lighter shades are more likely.

So what, in the end, does all this mean? We have quantified the conventional wisdom: the
"best fit model" is one with $\Omega + \Lambda = 1$, favoring a cosmological constant $\Lambda \gtrsim 0.7$. Alternatively, a
low $\Omega$ is possible, although less strongly favored, requiring $\Omega \lesssim 0.4$, at least with current data.
Recently, Leonard & Lake (1995) have performed a similar analysis, without the explicit emphasis
on probabilistic methods, coming to the similar conclusion that $\Omega < 1$ and $\Lambda > 0$ is the most
likely interpretation of the current data. As our knowledge of the age of the universe and the
Hubble constant improves (if the central values we have used here are indeed correct), cosmological
parameters further from the simplest flat universe with no cosmological constant will be required.
One novel feature of this analysis is its "automatic" use of Ockham's razor; for example, the model
with $\Omega + \Lambda < 1$ is disfavored with respect to the one-parameter models, even though it includes
them as a subset, because on average, it predicts smaller values for $H_0 t_0$.